What is hard negative mining (mining hard-to-classify samples)?
What is Hard Negative Mining
Let’s say I give you a bunch of images that contain one or more people, and I give you bounding boxes for each one. Your classifier will need both positive training examples (person) and negative training examples (not person).
For each person, you create a positive training example by looking inside that bounding box. But how do you create useful negative examples?
A good way to start is to generate a bunch of random bounding boxes, and for each that doesn’t overlap with any of your positives, keep that new box as a negative.
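A minimal sketch of that random sampling step, using only the standard library. The function names, the `(x1, y1, x2, y2)` box format, and the fixed 64x128 box size are assumptions made for illustration; they are not part of the original answer.

```python
import random

def boxes_overlap(a, b):
    """Return True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def sample_random_negatives(img_w, img_h, positives, n,
                            box_w=64, box_h=128, max_tries=10_000, seed=0):
    """Draw random boxes and keep the ones that touch no positive box."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    negatives = []
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        x1 = rng.randint(0, img_w - box_w)
        y1 = rng.randint(0, img_h - box_h)
        box = (x1, y1, x1 + box_w, y1 + box_h)
        # Keep the box only if it overlaps none of the labeled people.
        if not any(boxes_overlap(box, p) for p in positives):
            negatives.append(box)
    return negatives

# Example: one labeled person in a 640x480 image, 20 random negatives.
positives = [(100, 100, 164, 228)]
negatives = sample_random_negatives(640, 480, positives, n=20)
```

Rejecting any box with even a sliver of overlap follows the wording above; a real pipeline might instead allow a small overlap measured by an IoU threshold.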
Ok, so you have positives and negatives, and you train a classifier. To test it out, you run it on your training images again with a sliding window. But it turns out that your classifier isn’t very good, because it throws a bunch of false positives (people detected where there aren’t actually people).
A hard negative is one of those falsely detected patches, explicitly turned into a negative example and added to your training set. Hard negative mining is the process of collecting them. When you retrain your classifier, it should perform better with this extra knowledge and make fewer false positives.
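The collection step could be sketched roughly as follows. Here `score_fn` is a hypothetical stand-in for the trained classifier (it maps a window to a score), and the window size, stride, and 0.5 threshold are assumed values. A detection counts as a false positive in this sketch when it overlaps no labeled box, which is a simplification.

```python
def boxes_overlap(a, b):
    """Return True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield every window position on a regular grid inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)

def mine_hard_negatives(img_w, img_h, positives, score_fn,
                        threshold=0.5, win_w=64, win_h=128, stride=32):
    """Return windows the classifier fires on that contain no labeled person."""
    hard = []
    for box in sliding_windows(img_w, img_h, win_w, win_h, stride):
        detected = score_fn(box) >= threshold
        is_person = any(boxes_overlap(box, p) for p in positives)
        if detected and not is_person:
            hard.append(box)  # false positive: becomes a new negative example
    return hard

# Example with a fake scorer that fires on two windows, one of them a real person.
positives = [(0, 0, 64, 128)]
fake_score = lambda box: 0.9 if box[0] in (0, 128) and box[1] == 0 else 0.1
hard = mine_hard_negatives(256, 128, positives, fake_score, stride=64)
print(hard)  # [(128, 0, 192, 128)]
```

The returned boxes are the patches you would crop and append to the negative set before retraining.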
https://www.reddit.com/r/computervision/comments/2ggc5l/what_is_hard_negative_mining_and_how_is_it/
Brief summary
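You are given a set of images that each contain one or more people, with a bounding box around every person.
Positive examples come from the contents of those bounding boxes. What about negatives? Generate a batch of random bounding boxes and keep the ones that do not overlap any positive box; those are your negatives.
With positives and negatives in hand, you can train a classifier, then test it by running it over the training images again with a sliding window. At this point the classifier is not very good: it produces a lot of false positives (windows that do not overlap any person but are labeled as positive).
Hard negative mining means explicitly adding these false positives to the training set as negatives and training the classifier again, after which it should perform better. Repeat the process so that the false positives become fewer and fewer.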
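The whole train, detect, add, retrain loop can be illustrated on a toy problem. Everything here is an assumption chosen to keep the sketch tiny: each example is a single integer feature, the "classifier" is just a threshold halfway between the two class means, and the loop stops when a round finds no new false positives.

```python
def train(positives, negatives):
    """Toy classifier: a threshold halfway between the two class means."""
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2

def find_false_positives(threshold, negative_pool):
    """Known-negative samples that the classifier would call positive."""
    return [x for x in negative_pool if x > threshold]

def hard_negative_mining(positives, initial_negatives, negative_pool, max_rounds=10):
    negatives = list(initial_negatives)
    fp_history = []  # number of false positives seen in each round
    for _ in range(max_rounds):
        threshold = train(positives, negatives)
        false_pos = find_false_positives(threshold, negative_pool)
        fp_history.append(len(false_pos))
        # Only false positives not yet in the training set are new hard negatives.
        new_hard = [x for x in false_pos if x not in negatives]
        if not new_hard:
            break
        negatives.extend(new_hard)
    return threshold, fp_history

positives = [80, 90, 100]
initial_negatives = [0, 10, 20]          # "easy" random negatives
negative_pool = list(range(0, 71, 5))    # all background samples: 0, 5, ..., 70
threshold, fp_history = hard_negative_mining(positives, initial_negatives, negative_pool)
print(threshold, fp_history)  # 65.0 [4, 1]
```

In this toy run the false positives drop from 4 to 1 after one round of mining. The simple threshold model cannot remove the last one, which is a limit of the toy classifier rather than of the procedure.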